-
Notifications
You must be signed in to change notification settings - Fork 469
perf(profiling): try to intern strings/functions #15062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
perf(profiling): try to intern strings/functions #15062
Conversation
|
|
f806eda to
ff3a20e
Compare
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 236 ± 2 ms. The average import time from base is: 239 ± 2 ms. The import time difference between this PR and base is: -2.53 ± 0.1 ms. Import time breakdownThe following import paths have shrunk:
|
ff3a20e to
f598d1c
Compare
2864f97 to
091022c
Compare
|
Based on the failing jobs, it seems like the current version of my PR makes the profiler not behave well in all forking scenarios I'll try to understand why... |
Performance SLOsComparing candidate kowalski/perf-profiling-intern-strings-into-libdatadog (ef96a13) with baseline main (c529066) 📈 Performance Regressions (1 suite)📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 4.243µs (SLO: <10.000µs 📉 -57.6%) vs baseline: -0.6% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +4.5% ✅ ospathbasename_noaspectTime: ✅ 1.094µs (SLO: <10.000µs 📉 -89.1%) vs baseline: +1.1% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.8% ✅ ospathjoin_aspectTime: ✅ 6.154µs (SLO: <10.000µs 📉 -38.5%) vs baseline: ~same Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +5.4% ✅ ospathjoin_noaspectTime: ✅ 2.297µs (SLO: <10.000µs 📉 -77.0%) vs baseline: ~same Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +4.6% ✅ ospathnormcase_aspectTime: ✅ 3.863µs (SLO: <10.000µs 📉 -61.4%) vs baseline: 📈 +10.2% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.5% ✅ ospathnormcase_noaspectTime: ✅ 0.577µs (SLO: <10.000µs 📉 -94.2%) vs baseline: +0.5% Memory: ✅ 37.532MB (SLO: <39.000MB -3.8%) vs baseline: +5.0% ✅ ospathsplit_aspectTime: ✅ 4.770µs (SLO: <10.000µs 📉 -52.3%) vs baseline: -0.4% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.2% ✅ ospathsplit_noaspectTime: ✅ 1.595µs (SLO: <10.000µs 📉 -84.1%) vs baseline: +0.9% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +4.7% ✅ ospathsplitdrive_aspectTime: ✅ 3.623µs (SLO: <10.000µs 📉 -63.8%) vs baseline: -1.7% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.3% ✅ ospathsplitdrive_noaspectTime: ✅ 0.697µs (SLO: <10.000µs 📉 -93.0%) vs baseline: -1.4% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.6% ✅ ospathsplitext_aspectTime: ✅ 4.517µs (SLO: <10.000µs 📉 -54.8%) vs baseline: -1.2% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +4.5% ✅ ospathsplitext_noaspectTime: ✅ 1.385µs (SLO: <10.000µs 📉 -86.2%) vs baseline: +0.3% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +4.9% 🟡 Near SLO Breach (4 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 20.506ms (SLO: <22.300ms -8.0%) vs baseline: ~same Memory: ✅ 66.139MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.2% ✅ exception-replay-enabledTime: ✅ 1.351ms (SLO: <1.450ms -6.8%) vs baseline: +0.7% Memory: ✅ 64.346MB (SLO: <67.000MB -4.0%) vs baseline: +4.9% ✅ iastTime: ✅ 20.429ms (SLO: <22.250ms -8.2%) vs baseline: +0.2% Memory: ✅ 66.159MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.0% ✅ profilerTime: ✅ 15.548ms (SLO: <16.550ms -6.1%) vs baseline: ~same Memory: ✅ 54.004MB (SLO: <54.500MB 🟡 -0.9%) vs baseline: +4.9% ✅ resource-renamingTime: ✅ 20.514ms (SLO: <21.750ms -5.7%) vs baseline: ~same Memory: ✅ 66.168MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0% ✅ span-code-originTime: ✅ 25.345ms (SLO: <28.200ms 📉 -10.1%) vs baseline: ~same Memory: ✅ 67.227MB (SLO: <69.500MB -3.3%) vs baseline: +4.3% ✅ tracerTime: ✅ 20.456ms (SLO: <21.750ms -6.0%) vs baseline: ~same Memory: ✅ 66.153MB (SLO: <67.000MB 🟡 -1.3%) vs baseline: +5.0% ✅ tracer-and-profilerTime: ✅ 22.707ms (SLO: <23.500ms -3.4%) vs baseline: -0.2% Memory: ✅ 67.849MB (SLO: <68.000MB 🟡 -0.2%) vs baseline: +4.9% ✅ tracer-dont-create-db-spansTime: ✅ 19.291ms (SLO: <21.500ms 📉 -10.3%) vs baseline: -0.4% Memory: ✅ 66.198MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.0% ✅ tracer-minimalTime: ✅ 16.679ms (SLO: <17.500ms -4.7%) vs baseline: +0.3% Memory: ✅ 65.805MB (SLO: <67.000MB 🟡 -1.8%) vs baseline: +4.5% ✅ tracer-nativeTime: ✅ 20.497ms (SLO: <21.750ms -5.8%) vs baseline: +0.3% Memory: ✅ 67.869MB (SLO: <72.500MB -6.4%) vs baseline: +5.1% ✅ tracer-no-cachesTime: ✅ 18.460ms (SLO: <19.650ms -6.1%) vs baseline: +0.3% Memory: ✅ 66.208MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.2% ✅ tracer-no-databasesTime: ✅ 18.781ms (SLO: <20.100ms -6.6%) vs baseline: ~same Memory: ✅ 65.824MB (SLO: <67.000MB 🟡 -1.8%) vs baseline: +4.6% ✅ tracer-no-middlewareTime: ✅ 20.183ms (SLO: <21.500ms -6.1%) vs baseline: +0.2% Memory: ✅ 66.198MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.2% ✅ tracer-no-templatesTime: ✅ 20.334ms (SLO: <22.000ms -7.6%) vs baseline: ~same Memory: ✅ 66.218MB (SLO: <67.000MB 🟡 -1.2%) vs baseline: +5.1% 🟡 errortrackingdjangosimple - 6/6✅ errortracking-enabled-allTime: ✅ 18.020ms (SLO: <19.850ms -9.2%) vs baseline: -0.2% Memory: ✅ 66.070MB (SLO: <66.500MB 🟡 -0.6%) vs baseline: +4.6% ✅ errortracking-enabled-userTime: ✅ 18.307ms (SLO: <19.400ms -5.6%) vs baseline: +1.4% Memory: ✅ 66.092MB (SLO: <66.500MB 🟡 -0.6%) vs baseline: +4.6% ✅ tracer-enabledTime: ✅ 18.109ms (SLO: <19.450ms -6.9%) vs baseline: +0.3% Memory: ✅ 65.888MB (SLO: <66.500MB 🟡 -0.9%) vs baseline: +5.0% 🟡 errortrackingflasksqli - 6/6✅ errortracking-enabled-allTime: ✅ 2.073ms (SLO: <2.300ms -9.9%) vs baseline: ~same Memory: ✅ 52.553MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.7% ✅ errortracking-enabled-userTime: ✅ 2.097ms (SLO: <2.250ms -6.8%) vs baseline: +1.3% Memory: ✅ 52.652MB (SLO: <53.500MB 🟡 -1.6%) vs baseline: +4.8% ✅ tracer-enabledTime: ✅ 2.100ms (SLO: <2.300ms -8.7%) vs baseline: +1.6% Memory: ✅ 52.514MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.6% 🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 4.606ms (SLO: <4.750ms -3.0%) vs baseline: ~same Memory: ✅ 62.367MB (SLO: <65.000MB -4.1%) vs baseline: +5.0% ✅ appsec-postTime: ✅ 6.624ms (SLO: <6.750ms 🟡 -1.9%) vs baseline: ~same Memory: ✅ 62.370MB (SLO: <65.000MB -4.0%) vs baseline: +4.9% ✅ appsec-telemetryTime: ✅ 4.611ms (SLO: <4.750ms -2.9%) vs baseline: +0.7% Memory: ✅ 62.378MB (SLO: <65.000MB -4.0%) vs baseline: +5.2% ✅ debuggerTime: ✅ 1.856ms (SLO: <2.000ms -7.2%) vs baseline: -0.2% Memory: ✅ 45.319MB (SLO: <47.000MB -3.6%) vs baseline: +5.2% ✅ iast-getTime: ✅ 1.861ms (SLO: <2.000ms -6.9%) vs baseline: -0.3% Memory: ✅ 42.190MB (SLO: <49.000MB 📉 -13.9%) vs baseline: +4.8% ✅ profilerTime: ✅ 1.914ms (SLO: <2.100ms -8.8%) vs baseline: ~same Memory: ✅ 46.693MB (SLO: <47.000MB 🟡 -0.7%) vs baseline: +5.0% ✅ resource-renamingTime: ✅ 3.371ms (SLO: <3.650ms -7.6%) vs baseline: +0.3% Memory: ✅ 52.540MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.7% ✅ tracerTime: ✅ 3.360ms (SLO: <3.650ms -7.9%) vs baseline: +0.5% Memory: ✅ 52.531MB (SLO: <53.500MB 🟡 -1.8%) vs baseline: +4.7% ✅ tracer-nativeTime: ✅ 3.363ms (SLO: <3.650ms -7.9%) vs baseline: +0.5% Memory: ✅ 54.215MB (SLO: <60.000MB -9.6%) vs baseline: +5.1%
|
f925a11 to
94e2fa2
Compare
94e2fa2 to
ef96a13
Compare
Description
This PR updates Python Profiling / Stack V2 to use string interning features added by @morrisonlevi in libdatadog (DataDog/libdatadog@main...levi/phase1). This branch adds a concept of
ProfilesDictionaryin which profilers can intern strings and functions.The general ideas are the following:
StringTable), we rely what Echion gives us when it refers to strings (StringTable::Key, which really is just avoid*) and keep a map ofEchionStringId → LibDatadogStringId).std::string_viewfrom Echion'sStringTableand intern it into libdatadog. Once we have a libdatadog String ID, we can use it in our sampling logic.ddupthat abstracts awaylibdatadogconcepts – I could not leak theddog_prof_StringId2type toddupusers.string_id_tandfunction_id_ttypes (which in practice are just bothvoid*, which also is whatddog_prof_StringId2really is)string_id_t ddup_intern_string(std::string_view s);(and similarly for functions) that Stack V2 can call ot intern thingspush_labelmethods)."Thread Name","Task Name", etc.), the number of label values could be infinite (think actual Task names, Thread names, etc.). This in practice would be a memory leak.ddupand Stack V2 does not know about it (Stack V2 only deals inExportLabelKey's, which is anenum)One of the most challenging parts of this PR is getting threading and forking right:
ProfilesDictionaryitself is thread safe...ddupisn'tcall_oncewhich has a significant runtime cost inget_profiles_dictionary, so we need to carefully initialise things in the right orderProfilesDictionaryis actually a pointer to (an ARC to) a Profiles Dictionary Rust object.std::atexithook.More performance figures are available in this Notebook.
Testing
Risks
Additional Notes